Characterization and engineering of the type 3 secretion system needle monomer from Salmonella through the construction and screening of a comprehensive mutagenesis library

ABSTRACT Protein production strategies in bacteria are often limited due to the need for cell lysis and complicated purification schemes. To avoid these challenges, researchers have developed bacterial strains capable of secreting heterologous protein products outside the cell, but secretion titers often remain too low for commercial applicability. Improved understanding of the link between secretion system structure and its secretory abilities can help overcome the barrier to engineering higher secretion titers. Here, we investigated this link with the PrgI protein, the monomer of the secretory channel of the type 3 secretion system (T3SS) of Salmonella enterica. Despite detailed knowledge of the PrgI needle’s assembly and structure, little is known about how its structure influences its secretory capabilities. To study this, we recently constructed a comprehensive codon mutagenesis library of the PrgI protein utilizing a novel one-pot recombineering approach. We then screened this library for functional T3SS assembly and secretion titer by measuring the secretion of alkaline phosphatase using a high-throughput activity assay. This allowed us to construct a first-of-its-kind secretion fitness landscape to characterize the PrgI needle’s mutability at each position as well as the mutations which lead to enhanced T3SS secretion. We discovered new design rules for building a functional T3SS as well as identified hypersecreting mutants. This work can be used to increase understanding of the T3SS’s assembly and identify further targets for engineering. This work also provides a blueprint for future efforts to engineer other complex protein assemblies through the construction of fitness landscapes. IMPORTANCE Protein secretion offers a simplified alternative method for protein purification from bacterial hosts. However, the current state-of-the-art methods for protein secretion in bacteria are still hindered by low yields relative to traditional protein purification strategies. Engineers are now seeking strategies to enhance protein secretion titers from bacterial hosts, often through genetic manipulations. In this study, we demonstrate that protein engineering strategies focused on altering the secretion apparatus can be a fruitful avenue toward this goal. Specifically, this study focuses on how changes to the PrgI needle protein from the type 3 secretion system from Salmonella enterica can impact secretion titer. We demonstrate that this complex is amenable to comprehensive mutagenesis studies and that this can yield both PrgI variants with increased secretory capabilities and insight into the normal functioning of the type 3 secretion system.

protein expression methods are intracellular, however, and downstream processing requires lysis and recovery of the protein product from a biochemically simi lar milieu.Recovery of proteins expressed intracellularly in bacteria is often further complicated by low soluble yields caused by host toxicity or accumulation of the protein product in insoluble aggregates (1,2).Secreting the protein out of the cell has the potential to alleviate these problems while retaining the benefits of bacterial protein production.At least five types of bacterial secretion systems have been shown to secrete recombinant proteins outside the cell, though commercial feasibility remains out of reach due to limitations on titer, substrate compatibility, and efficiency (3)(4)(5)(6).
The type 3 secretion system (T3SS) transports proteins directly from the cytoplasm to the extracellular space and is thus a promising platform for protein secretion in bacteria.It is not required for cellular viability, which allows it to be co-opted solely for heterolo gous protein production.The T3SS is capable of secreting a wide variety of recombinant proteins (7)(8)(9).However, the system remains unoptimized for secreting products at industrially viable titers, and the engineering space remains largely unexplored.The structure of the T3SS is well-studied and plays an integral part in its function (Fig. 1).However, little is known about how the structure of the T3SS influences its secretory capability.Learning how to manipulate the apparatus to improve secretion efficiency for recombinant proteins is key to developing the T3SS as a platform for protein production.
The T3SS is a large (3.5 MDa) protein complex that spans the inner and outer membranes of the bacterial cell (12, 13) (Fig. 1).A series of rings embedded in the inner and outer membranes comprise the basal body (Fig. 1).An oligomeric tunnel, FIG 1 The type 3 secretion system needle is composed of assembled PrgI monomers.(Left) Schematic representation of the type 3 secretion system, which is a large protein complex composed of a basal body that spans the inner and outer cell membranes as well as the periplasm (10).The needle (teal) is composed of assembled PrgI monomers (right) that extend into the extracellular space.The needle is capped by a tip complex composed of the SipD protein.Proteins are secreted from the inside of the cell to the extracellular space through the lumen in the needle complex.PrgI monomers are from the PDB structure 6dwb (11).rooted in the basal body and composed of about a hundred copies of a single protein, PrgI, extends into the extracellular space (Fig. 1).The tunnel, or needle, is capped by a tip complex (Fig. 1).A detailed mechanism of protein translocation through the needle lumen remains unknown, but variations in the proteins that compose the T3SS apparatus, including PrgI, affect its native secretion function (14)(15)(16)(17).
We sought to probe an under-explored area: the impact of variations in T3SS apparatus proteins on secretion.To do so, we chose to use deep mutational scanning and protein fitness landscapes, which are powerful tools for defining protein structurefunction relationships and identifying candidates with improved characteristics (18).This technique is frequently applied to monomeric or small protein complexes with a direct functional output that is readily assayed.Establishing a fitness landscape for self-assem bling proteins can be more challenging, as any functional screen or selection reports on both assembly and the functional output.
We used this combined output to our advantage, however, and adapted the recently developed SyMAPS method (19) to produce a "secretion fitness landscape" (SFL) of the Salmonella enterica Typhimurium T3SS needle protein PrgI.Briefly, we created a first-of-its-kind genomically integrated comprehensive codon mutagenesis (CCM) library and ranked the library members by secreted protein titer.Next generation sequencing of the ranked variants enabled construction of an SFL.The SFL revealed design rules for a functional T3SS needle and provides a blueprint for future engineering targets for increased secretion titer.Through this process, we also were able to identify and confirm new hypersecreting variants.Beyond this immediate application, the work provides a blueprint for the use of comprehensive library design and high-throughput screening to characterize and engineer other multimeric protein structures, including other T3SS proteins as well as the components of other secretion systems.

Validating the construction and screening of a genomically encoded PrgI variant library
PrgI and several of its homologs have been analyzed through alanine scanning and targeted substitution (14,15,20,21).Single amino acid changes produced variable secretion phenotypes, so we hypothesized that a combined CCM and library screening approach could reveal hypersecreting variants of PrgI.To accomplish this, we validated a genomic library construction and screening strategy.Salmonella pathogenicity island 1 (SPI-1) T3SS assembly and activation is a tightly controlled, highly orchestrated process (12,22), and SPI-1 genes have overlapping regulatory elements, particularly in the prg operon where prgI is located (23,24).Thus, we sought to construct a genomically encoded library to study only the effect of structural changes in PrgI by maintaining the native regulatory structure and induction cascade.
We chose to employ λ Red recombineering (25) to create the library because it enables construction in a single step.To our knowledge, this technique has not been used in library construction before we carried out this study, so we performed a pilot test by creating a saturation mutagenesis library at only PrgI position 41 to validate both the library construction and screening methods (Fig. 2A).PrgI P41 was of interest because an alanine substitution at the homologous position in MxiH, the needle protein for the Shigella flexneri T3SS injectosome, increased secretion of native substrates (15).The template strain for library construction contained a copy of alkaline phosphatase fused to the T3SS secretion tag SptP integrated at the sptP locus, providing a reporter for secretion titer through use of an enzyme activity assay (Fig. 2A) (26).It also contained selectable markers at the prgI locus, to be replaced by the prgI variants (Fig. 2A).An equimolar pool of all 20 fragments of the saturation mutagenesis library was introduced into this sptP::sptP 1-167 -phoA prgI::catG-sacB strain using a single λ Red recombineering event, and 68 of the resulting clones were Sanger sequenced (Fig. 2A).Of those 68 clones, 88% contained prgI alleles that successfully replaced the catG-sacB cassette, and all variants were present except PrgI P41C (Fig. S1).The PrgI P41 saturation mutagenesis library produced variable secretion titers, and several variants increased secretion titer compared to wild-type PrgI (PrgI WT ) (Fig. 2B).We used the initial rate of secreted alkaline phosphatase activity as a proxy for secreted enzyme titer (27), and all variants were normalized to PrgI WT (Fig. 2).Relative secretion titers as assessed by alkaline phosphatase activity generally agreed with western blotting results (Fig. 2B).The high rate of substitution at the PrgI locus and the high coverage of the PrgI P41 library gave us confidence that we could construct and screen a full PrgI library, and the variable secretion titers convinced us of the utility of such a screen.

Generating a secretion fitness landscape of PrgI
With the genomically encoded library construction and screening methods validated, we constructed the full CCM library of PrgI.We created the library in a single λ Red recombineering event using the sptP::sptP 1-167 -phoA prgI::catG-sacB strain and an equimolar pool of synthetic, double-stranded prgI gene fragments, each containing a single amino acid change (Fig. 3).The first and last six amino acids were not changed because they are essential for assembly (15,20) and wild-type (WT) residues were not present in the library.Because secretion titer is a phenotype necessarily separate from genotype (the secreted protein is physically separated from the cell and its associated genetic information), clones were screened individually in 96-well plates (Fig. 3).Glycerol stocks were maintained in the same array as the samples measured for secretion titer to provide the link between genotype and secretion titer (Fig. 3).We screened 4,406 clones to capture threefold the variant library size with a cushion for false positives from the recombineering process.Secretion titer, as measured by the initial rate of secreted alkaline phosphatase activity, was normalized to that of PrgI WT to allow comparison across assay plates.The unscreened library contained 99% of expected variants, and the screened clones captured 90% of the library.After the initial screen, clones were re-arrayed according to relative secretion titer and sorted into 10 pools for high-through put sequencing (Fig. 3).
Dividing clones into pools according to relative secretion titer allowed assignment of a "secretion score" to each variant (Fig. 3).If variants appeared in multiple pools, a weighted-average secretion score was calculated using the relative abundance of the variant in each pool as weights.A window of likely pool appearances was defined by the alkaline phosphatase activity assay coefficient of variation, and scores outside of that window were not included in the weighted average calculation (see supplemental methods).This scoring led to construction of a quantitative "secretion fitness landscape" that reports on both functional substitutions and those that increased secretion titer (Fig. 4).WT secretion levels are expected to fall within scores 2-5, variants with scores less than 1 are expected to be non-functional, and variants with scores above 7 are anticipated to confer secretion titers higher than PrgI WT .

The SFL is experimentally validated and reflects known structural constraints
The synthesized library did not contain WT residues or stop codons, nor was any amino acid encoded with synonymous codons, so there were no controls within the SFL to assess validity of the method internally.Instead, we validated the SFL experimentally.All clones from the highest secreting pool (pool J) and a random selection of 30 additional clones distributed across the remaining pools (pools A-I) were Sanger sequenced and re-evaluated individually in triplicate using initial rate of secreted alkaline phosphatase the specified PrgI mutants in LB-L."Δ" is ASTE13 prgI::catsacB.Relative secretion titer was measured using densitometry from western blots and normalized to a PrgI WT strain.Western blots are representative of four biological replicates.Relative secretion titer was also measured by averaging the slope of absorbance at 405 nm versus time for four biological replicates and normalizing to PrgI WT .Absorbance at 405 nm was recorded every 5 minutes for 16 hours on a BioTek Synergy HTX plate reader at 37°C.Error bars represent standard error.Representative western blots of the secreted fraction (SF) and the whole culture lysate (WCL) are shown below.activity as a proxy for secretion titer (Fig. 5).Secretion titer increased with secretion score (R 2 = 0.87), and all variants in pool J showed secretion titers at least 50% greater than PrgI WT (Fig. 5).Combined, these results confirm the ability of the secretion score to predict general secretion behavior.
The SFL can also be validated by examining poorly substituted residues in the context of available structural information.To determine the overall mutability of each position in PrgI, an average secretion score was calculated by averaging the secretion scores of all variants at each position (Fig. 6, right).Table 1 lists all residues with an average secretion score <0.5, i.e., most or all variants were non-functional, making the position immutable.For each of these residues, we calculated conservation and buried surface area (BSA) (Table 1; Fig. 6, left and middle).Conservation is a measure of how similar the amino acid sequence is between protein homologs in different organisms.If a residue is highly conserved, it indicates that the position is likely crucial in the function of the protein and cannot be mutated.We calculated conservation from all members of Pfam group PF09392, which are T3SS needle homologs across enteric bacteria.BSA is a measure of how accessible a residue is to solvent in the context of the protein complex.Residues with a high BSA are likely crucial to the protein's structure and cannot be mutated without affecting protein folding or structure.Protein Data Bank in Europe-Proteins, Interfaces, Structures and Assemblies (PDBePISA) yielded BSA for each residue and each monomer in PDB 6dwb (28), and BSA was averaged across all monomers and normalized FIG 3 Workflow for PrgI library assembly, screening, and analysis.Gene blocks coding for a single amino acid change in PrgI were introduced to ASTE13 sptP::sptP (1-167) -phoA-2×FLAG-6×His prgI::catsacB as a mixture in a single recombineering event to create the library.Individual colonies were inoculated for secretion and screened for alkaline phosphatase activity as described in Materials and Methods.The randomly arrayed clones were sorted according to relative secretion titer, combined into their assigned pools, and prepared for sequencing on an Illumina MiSeq.
to the averaged accessible surface area per residue.For the poorly substituted residues listed in Table 1, most were buried within the structure, conserved, or both (Fig. 6), which suggests that the native amino acid at those positions plays an important role in the needle structure.Indeed, recently solved structures of the needle show that all residues in Table 1 participate in inter-and intra-subunit interaction networks that stabilize the needle filament or are essential for secretion activity (17,29,30).

Amino acid properties reveal substitution preferences
Examining the preference for 10 amino acid properties at each position condensed the SFL and revealed substitution preferences (Fig. 7).To do this, we grouped amino acids according to particular properties including volume, molecular weight, length, steric hindrance, polarity, polar area, fraction water, hydrophobicity, nonpolar area, and flexibility (Fig. 7).We averaged the secretion scores of these amino acid substitutions at each position to calculate a property secretion score.Substitutions that increased secretion titer were often large and hydrophobic (Fig. 7).The residues that accepted few functional substitutions, i.e., those with low average secretion scores, preferred no amino acid property (Fig. 7).Many residues, especially in the N-terminal helix (residues 7-37), had functional substitutions with all types of amino acids.In general, polar and flexible amino acids were disfavored, often together in a repeating pattern throughout PrgI (see the negative scores for polar substitutions at residues 23, 27, 31, etc.).The native amino acids in this pattern were hydrophobic except at positions R58 and T64.
This preferential property patterning, tendency of hydrophobic residues to increase secretion score, and the observation that buried and conserved residues were generally immutable (Table 1; Fig. 6) led us to hypothesize that conservation and BSA could serve as predictors of mutational tolerance.Additionally, we hypothesized that patterns in mutability would correspond to structural components of PrgI.To test this idea, we calculated conservation and BSA as before for all residues.We clustered positions according to secretion score to better discern patterns in the data.The clustering produced three groups that roughly corresponded to high, medium, and low average secretion scores (Fig. S2).Importantly, residues that were both highly conserved and buried had low mutational tolerance as expected, but that was the limit of the predictive power of conservation and BSA on positional mutation tolerance.Residues with both low BSA and low conservation allowed many functional substitutions, but few increased secretion titer above PrgI WT (Fig. S2).As expected from the prior observation of substitu tion preferences by property, the highest secreting PrgI variants (cluster in the top left corner of Fig. S2) were often large hydrophobic residues in the C-terminal helix.

The SFL reveals structural substitution patterns
PrgI is composed of two helices connected by a four-amino acid loop (Fig. 8A).The N-terminal helix is exposed to the exterior environment, while the C-terminal helix is packed into the structure and forms the needle interior (Fig. 8A).Mapping the average secretion score per residue onto the needle structure confirmed a pattern suggested by the SFL and clustering (Fig. 4 and 8)-many substitutions were allowed in the  N-terminal helix (needle exterior, Fig. 8B, left), but few increased secretion titer above PrgI WT .Conversely, few substitutions were functional in the C-terminal helix (needle lumen, Fig. 8B, right), but functional substitutions frequently increased secretion titer residues (Fig. 8B, dark teal substitutions).
The needle interior shows a stark substitution pattern, with alternating helical bands of well-substituted and poorly substituted residues (Fig. 8B and C).The band of poorly substituted residues includes those C-terminal residues excluded from the library (residues in gray, Fig. 8) because they are essential for needle assembly (15,20).Models of the PrgI needle depict the needle interior as a right-handed groove with alternating charged and hydrophobic residues forming the edges and lumen of the groove, respectively (30) (Fig. 8C).The raised, charged groove is highly conserved across all species with a T3SS (29, 30), so we were not surprised to discover that most modifications at those residues disrupted secretion.
Poorly substituted residues also occurred at the interfaces between helices and adjacent monomers (Fig. 8C).A common theme of poorly substituted residues, aside from degree of burial and conservation, was that the mutations that were tolerated were of similar character to the native residue (Fig. 7).This supports the hypothesis that these interfacial residues facilitate proper structural arrangement and packing of each monomer within the needle structure.
Of the 63 variants with secretion scores greater than 7, 55 were in the C-terminal helix.We were surprised to discover that half of those 55 favorable substitutions were present at residues 55, 59, 63, 73, and 74, which form or contact the hydrophobic groove in the needle lumen (Fig. 8C).The hydrophobic groove is fairly well-conserved (Fig. 6), so it was surprising to find that many mutations were not only tolerated but significantly increased secretion titer at these positions.Furthermore, the most favorable substitutions at those residues were larger and more hydrophobic amino acids, possibly indicating that a more hydrophobic (or "greasy") channel is favorable for improved secretion efficiency of some proteins (Fig. 4 and 7).

Beneficial mutations are additive with other secretion titer enhancements
Secretion titer via the T3SS is maximized in an optimal growth medium with plasmidbased expression of the secreted protein fused to the secretion tag SptP and overex pression of a T3SS master regulator, hilA (5,9).Thus, we sought to evaluate whether the beneficial PrgI mutations revealed by this study caused a general increase in secretion titer and were additive with those existing strategies.We selected four top clones: A45C, S49R, N59W, and A74Q.N59W and A74Q face the needle interior and have a different character than the native amino acid.A45C and S49R likely affect inter-subunit interactions and/or interactions of PrgI with other T3SS components.In combination with hilA overexpression and in an optimized medium, all four variants increased secretion titer of two model proteins, the human domain of intersectin (DH) and recombinant human growth hormone (HGH), at least 50% above PrgI WT , indicating gray denotes residues that were not modified in the library design.(C) Substitution with large hydrophobic amino acids increased secretion titer at residues that line the hydrophobic groove of the needle interior.The hydrophobic groove in the needle interior is composed of alternating N55, N59, A73, and A74 from the indicated chains.Residues L66, L69, Q77, and R80 form the charged, raised groove.Residue 70 contributes to the charged, raised groove with its native aspartic acid but also tolerated several amino acid substitutions.A ribbon colored by average secretion score shows the predicted orientations of each amino acid.that the variants were additive with the other improvements (Fig. 9).There was no significant difference among the variants, suggesting that a twofold increase in secretion titer was an upper limit for these variants in combination with other enhancements of secretion titer.

DISCUSSION
Protein engineering relies on either building large libraries of randomly generated variants or small libraries of rationally designed variants.However, as the cost of DNA synthesis and sequencing has fallen, we are now able to construct CCM libraries, which encompass every possible individual mutation across a protein sequence.These libraries combine the advantages of the breadth in sequence space covered by a random mutagenesis library with the depth of a rationally designed approach.While CCM has been applied to both individual proteins (e.g., haloalkane dehalogenase [31]) and proteins which self-assemble into more complex structures (e.g., the MS2 virus like particle [19]), it has been difficult to apply toward proteins involved in the structure and function of living systems.One significant barrier has been the need to incorporate the library into the genome to maintain the native expression and regulation of the gene.With this work, we demonstrate that methods such as λ Red recombineering can be a scalable tactic for quickly incorporating a CCM library directly into the genome.This opens the door to apply the deep mutational scanning and protein fitness landscape framework to new cellular targets, including other multimeric protein structures similar to secretion systems, and even proteins involved in other cellular functions, like signaling or regulation.
One immediate advantage of the CCM approach is to engineer systems within living cells to achieve a desired function.In this work, we discovered many variants with higher secretion scores than PrgI WT and four variants that conferred an ~50% increase in secretion titer above that observed with PrgI WT , even in the context of our optimal engineered system.In contrast, the alanine scan of the S. flexneri T3SS PrgI homolog MxiH was only able to identify seven variants with enhanced secretion titer (15).
The real power in the CCM approach is the information gleaned from the resulting fitness landscapes.This information allows us to generate new hypotheses to deepen our understanding of the protein structure/function relationship, which in turn can inspire new engineering approaches to further our control of the system.In this work, we not only found single mutations which increased secretion titer, but we were also able to analyze which positions were most tolerant of mutations and what kind of amino acid substitutions were allowable to maintain function.These granular data surpass what is possible through conservation studies of protein homologs, for example.By connecting the SFL with the assembled PrgI structure, we discerned that many favorable substitu tions occurred in the C-terminal helix positions which make up the hydrophobic groove in the needle lumen.These positions are highly conserved, so it was not obvious from conservation alone that modifying the hydrophobic groove would not only be accepted but substantially enhance secretion.Even more surprising was a preference for large hydrophobic substitutions in high-secreting variants, a finding that would have been unlikely to be reached using rational design methods.Future analysis of the SFL and the protein structure could further our understanding of the T3SS's assembly and how proteins are secreted through the channel.
The secretion fitness landscape of PrgI does not provide a roadmap for mutations that enhance native T3SS fitness.Rather, it might provide a roadmap for mutations that decrease native T3SS fitness.The native secretion apparatus did not evolve to secrete a maximum amount of each native substrate-secretion titers of native substrates via the T3SS are tightly controlled in its native context (32,33).A recent alanine scan of PrgI revealed that mutations that increased secretion of native effectors produced variable invasion levels of S. enterica Typhimurium into human intestine epithelial cells, indicating that some mutations interrupted the native program of the T3SS (17).The high average secretion scores of cysteine and methionine underline this difference, as the native PrgI lacks cysteine, methionine, histidine, and tryptophan.
Finally, this work highlights an underutilized approach toward engineering secretion systems.Typically, efforts have focused on rewiring regulation to maximize expression of the system (5), modifying the secreted protein cargo to be secretable (34,35), and porting the machinery to other organisms (24,36).Modifications to the structural proteins of the secretion system have mostly been limited to targeted insertions or deletions (16,37).We have shown that even though they are highly conserved, the engineering space for T3SS structural proteins is not limited.This work provides a blueprint for engineering other T3SS proteins and proteins of other secretion systems to further the goal of high titer, low cost of valuable heterologous proteins.

Strains and growth conditions for secretion experiments
Strains, plasmids, and primers used are listed in Tables S1 to S3, respectively.Secretion experiments were started by growing a single colony in the lysogeny broth Lennox formulation (LB-L) (10 g/L tryptone, 5 g/L yeast extract, 5 g/L NaCl) with appropri ate antibiotics (34 µg/mL chloramphenicol for P sicA vectors, 50 µg/mL kanamycin for P lacUV5 hilA) for 12-16 hours overnight in an orbital shaker at 37°C and 225 rpm unless otherwise specified.Overnight cultures were diluted 1:100 into the appropri ate medium supplemented with appropriate antibiotics and 100 µg/mL isopropyl β-D-1-thiogalactopyranoside if the strain carried P lacUV5 hilA.All culturing steps were performed in 5 mL cultures in 24-well deepwell plates (Axygen) unless otherwise specified.Secretion was performed at 37°C and 225 rpm in an orbital shaker for 8 hours unless otherwise specified.The secreted fraction was harvested by centrifuging cultures at 4,000 × g for 10 minutes.Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) samples for the secretion fraction were prepared by adding supernatant to Laemmli buffer in a 3:1 ratio; SDS-PAGE samples for whole culture lysate were prepared by adding cell suspension to Laemmli buffer in a 1:2 ratio.All SDS-PAGE samples were boiled at 95°C for 5 minutes immediately after preparation.

Protein separation, western blotting, and densitometry
Samples were separated by SDS-PAGE and transferred to a polyvinylidene fluoride membrane (Millipore) for western blotting.If necessary, samples were further diluted in 1× Laemmli buffer such that all band signals were within twofold of the average signal across the blot.Membranes were probed with mouse anti-FLAG per manufactur er's instructions (Sigma Aldrich).A secondary labeling step was performed with goat anti-mouse IgG (H+L) horseradish peroxidase (HRP) conjugate according to manufactur er's instructions (Thermo Fisher) to facilitate chemiluminescent detection.Bands were detected with the SuperSignal West Pico Plus or SuperSignal West Femto (for the P41 mutants) substrates (Thermo Fisher) and a ChemiDoc XRS+ imaging system (Bio-Rad).
All relative protein quantities from western blotting were calculated by performing densitometry using ImageJ or Image Lab software (Bio-Rad) and normalizing to the average of the replicates of the PrgI WT samples.Relative protein amounts were corrected for dilution if appropriate.Error bars are standard deviation on three biological replicates unless otherwise specified (P41 mutant studies used four biological replicates).

PCR and cloning
Primers used in this study are listed in Table S3.Polymerase chain reaction (PCR) was performed with Phusion DNA polymerase for Quikchanges and constructing parts for recombineering.Saturation mutagenesis at position 41 was performed by introducing mutations to prgI carried on a P lacUV5 -inducible plasmid with a Quikchange protocol.Mutations were confirmed by Sanger sequencing.Double-stranded DNA fragments for recombineering contained the replacement gene(s) flanked 5′ and 3′ by 40 base pairs (bp) of homology to the genetic locus at which the replacement gene(s) should be inserted.The 40 bp of homology was included in oligos and attached via PCR using the primers listed in Table S3.The catG-sacB cassette was amplified from the purified genome of Escherichia coli TUC01; PrgI position 41 mutants were amplified from the appropriate P lacUV5 -inducible plasmid; and sptP-phoA-2×FLAG-6×His was amplified from a P sicA sicP sptP-phoA-2×FLAG-6×His secretion plasmid.Colony PCR was performed by diluting a colony in a 50 µL PCR mixture containing the appropriate primers and amplifying with GoTaq polymerase.Correct sequences were confirmed by Sanger sequencing.

Strain construction for single genomic modifications
Strain modifications were generated by λ Red recombineering as described by Thoma son et al. (25).Briefly, a colony of ASTE13 carrying the pSIM6 plasmid was inoculated in LB-L with 30 µg/mL carbenicillin and grown at 30°C and 225 rpm for 16-20 hours.The overnight culture was diluted 1:70 into 35 mL of LB-L and grown at 30°C until OD 600 reached 0.4-0.6.The culture was washed twice with 30 mL ice-cold sterile ddH 2 O and centrifugation at 4,600 × g for 3 minutes to collect the cells.After the second wash, cells were resuspended in ~400 µL of ice-cold sterile ddH 2 O. Aliquots of 50 µL resuspended cells were mixed with 200 ng of the appropriate PCR fragment and electroporated at 1,800 V for 5 ms.A negative control containing no added DNA was also electroporated.Cells were mixed with 950 µL Super Optimal broth with Catabolite repression (SOC) medium immediately after electroporation, and either recovered at 30°C for an hour for cat-sacB cassette introduction (first step of recombineering) or transferred to a test tube containing 9 mL of LB-L and grown at 37°C and 225 rpm for 4 hours for cat-sacB removal and replacement (second step of recombineering).Cells were diluted serially to 10 −3 in sterile phosphate buffered saline (PBS).Two hundred microliters of diluted cells was plated on 6% sucrose agar and grown at 37°C overnight.The second step of recombineering for the ∆invA knockout replaced the cat-sacB cassette by electroporating 1 µL of a 10 µM solution of a single 60 bp oligo containing the first and last 30 bp of the invA gene.

Library construction
A library of gene blocks carrying all possible amino acid substitutions was synthesized and pooled by Twist Biosciences.Codons were fully randomized ("NNN, " meaning any nucleotide at all three codon positions), but the library excluded wild-type residues and stop codons.Residues 1-6 and 76-80 were not modified.The lyophilized DNA from Twist Biosciences was reconstituted in ultrafiltered water to a concentration of 200 ng/µL.Recombineering was performed with an ASTE13 sptP::sptP(1-167)-phoA-2×FLAG-6×His prgI::cat-sacB as described above with the following modifications: 200 ng (4 µL of a 50 ng/µL resuspended solution) of the library was transformed into 100 µL of recom bination-competent cells via electroporation at 1,800 V and 5 ms.A negative control containing no added DNA was also electroporated.Cells were immediately mixed with 900 µL SOC medium and transferred to a 14 mL disposable culture tube (Fisherbrand) containing 2 mL of LB-L for a 4-hour recovery at 37°C and 225 rpm.Recombination efficiency was assessed by plating 200 µL of cells diluted serially to 10 −3 in sterile PBS from both the library and the negative control on 6% sucrose agar and allowing colonies to develop at room temperature for 24 hours.The remainder of the culture was mixed with 60% glycerol in a 1:3 ratio and aliquoted into three cryovials for storage at −80°C.Before storage, 2 × 33 µL aliquots of the glycerol mixture were diluted to facilitate plating single colonies for screening.The first aliquot was diluted in 1.2 mL sterile PBS, and the second aliquot was diluted in 1.2 mL PBS with 15% glycerol and frozen at −80°C.The 1.2 mL aliquot without glycerol was further split into 3 × 400 µL aliquots, and each was plated on a 15 cm agar plate with 6% sucrose LB agar.Colonies developed for 24 hours at room temperature.

Alkaline phosphatase activity
Alkaline phosphatase activity was measured by monitoring p-nitrophenol phosphate (pNPP, Sigma) cleavage.A stock solution of 0.1 M pNPP prepared in 1 M Tris, pH 8.0, was thawed from −20°C and diluted to 0.01 M in 1 M Tris, pH 8.0.Twenty microliters of the secretion fraction was added to 140 µL of 1 M Tris, pH 8.0, and 40 µL of the 0.01 M pNPP solution was added to each well.Alkaline phosphatase activity was measured on a BioTek Synergy HTX plate reader by monitoring absorbance at 405 nm at 37°C for 1 hour, taking measurements each minute.

Sample preparation for high-throughput sequencing and data processing
Detailed methods of sample preparation, high-throughput sequencing, and subsequent data processing are available in the supplemental material.

FIG 2
FIG 2 PrgI variants can be screened for activity.(A) Schematic of the library generation and screening methods for assessing PrgI variants.Genes encoding PrgI variants were cloned into the prgI locus in a strain containing an SptP-tagged PhoA (alkaline phosphatase [AP]) reporter at the sptP locus.Variants were then assayed for enzyme activity as a proxy for secretion titer.(B) SptP-AP-2×FLAG-6×His was secreted from ASTE13 sptP::sptP(1-167)-phoA-2×FLAG-6×His and (Continued on next page)

FIG 4
FIG 4Weighted-average secretion scores for all single amino acid variants of PrgI.The PrgI library was constructed and screened for relative secretion titer using the alkaline phosphatase (AP) activity assay.Variants were sorted into 10 pools according to relative secretion titer and sequenced.If variants appeared in multiple pools, a weighted average secretion score was used.Wild-type residues not present in the library are indicated by hatches.Gray boxes denote variants that did not appear in any sequenced pools, i.e., those variants were not screened.Dark teal indicates higher secretion titer, while dark pink indicates no secretion.

FIG 5
FIG5 Weighted-average secretion scores predict relative secretion titer.All clones from the highest secreting pool and 30 random clones from the remaining pools were patched onto LB agar plates from the reformatted glycerol stocks sorted by pool (see supplemental methods for details on reformatting).Patched colonies were inoculated for secretion titer measurement, which followed the same workflow as library screening.Clones were Sanger sequenced, and the secretion score was plotted against the newly measured secretion titer (A).Error bars represent standard error of three biological replicates.Clones from the highest secreting pool were plotted separately to highlight differences (B).WT-level secretion titer is indicated by a dotted line.PrgI P41A (stripes) was included for comparison as the previous best secreting variant.Error bars represent one standard deviation to allow direct comparison among variants.

FIG 6
FIG 6 Trends in secretion score, conservation, and buried surface area for PrgI variants.Relative exposed surface area of PrgI residues (high in blue, low in red) (left), relative conservation of PrgI residues (conserved in purple, variable in white) (middle), and mean secretion score for PrgI variants (high secretion in teal, low secretion in pink, WT secretion in white) (right).Immutable residues are highlighted with dashed lines.

FIG 7 FIG 8
FIG 7 Trends in amino acid property and secretion scores.Amino acids were grouped by the physical properties shown on the x-axis (bottom), and the secretion scores of each group at each position were averaged to calculate a property secretion score.Low secretion is shown in pink, high secretion is shown in teal, WT-like secretion is shown in white.

FIG 9
FIG9 High secreting PrgI variants enable increased secretion of model proteins.The WT strain along with four of the top secreting variants from the PrgI library were transformed with the hilA overexpression plasmid and a plasmid expressing either SptP-DH or SptP-HGH.Each strain was grown in an optimal growth medium.Secretion samples were collected after 8 hours, and relative secretion titer was measured via semi-quantitative western blot.Error bars represent the standard error of three biological replicates.